arXiv: http://arxiv.org/abs/1508.05508
... The most interesting thing here was how they handled variable-sized inputs (as of now, I haven't read "Memory Networks" yet). Note that for fact reasoning we may have a different number of facts per sample, each one a sentence of arbitrary length. They handled that by cascading DNNs, RNNs and/or pooling operations to end up with a fixed-size vector that could ultimately be classified or used as a seed for a final sequence-generating RNN. The stack also does "sensor fusion" between query and facts at each level.
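To make the "variable-sized in, fixed-size out" idea concrete, here is a minimal pure-Python sketch. It is not the paper's architecture: the toy embedding, the mean pooling over words (standing in for the sentence RNN), the elementwise product (standing in for the learned query/fact DNN), and all names (`DIM`, `embed_word`, `encode_sentence`, `fuse`, `pool_facts`, `reason`) are my own assumptions for illustration. It only shows a single layer, and only that pooling lets any number of facts of any length collapse into one vector of fixed size.

```python
import random

DIM = 8  # hypothetical embedding size, chosen arbitrarily for this sketch


def embed_word(word, dim=DIM):
    """Toy deterministic embedding: seed a PRNG with the word itself."""
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode_sentence(sentence, dim=DIM):
    """Mean-pool word embeddings: any sentence length -> one dim-sized vector.

    Assumption: this replaces the sentence-encoding RNN with plain averaging.
    """
    vectors = [embed_word(w, dim) for w in sentence.lower().split()]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fuse(query_vec, fact_vec):
    """Toy query/fact interaction (elementwise product).

    Assumption: a stand-in for a learned network that mixes query and fact.
    """
    return [q * f for q, f in zip(query_vec, fact_vec)]


def pool_facts(fused_vectors):
    """Max-pool across facts: any number of facts -> one dim-sized vector."""
    return [max(col) for col in zip(*fused_vectors)]


def reason(query, facts, dim=DIM):
    """Encode the query and every fact, fuse each pair, then pool over facts."""
    q = encode_sentence(query, dim)
    fused = [fuse(q, encode_sentence(f, dim)) for f in facts]
    return pool_facts(fused)


two_facts = reason(
    "where is the ball",
    ["mary took the ball", "mary went to the kitchen"],
)
three_facts = reason(
    "where is the ball",
    ["mary took the ball", "mary went to the kitchen", "john sat down"],
)

# Both outputs have length DIM, regardless of how many facts went in.
print(len(two_facts), len(three_facts))
```

In a real model the fixed-size output would feed a classifier or seed a decoder RNN, and the encode/fuse steps would be trained and stacked over several layers rather than applied once.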